Abstract: In a typical microcontroller embedded system, the whole memory is accessed by a single microcontroller,
which uses only a fraction of it, so the rest is wasted. In addition, that single microcontroller executes the
entire program. In this work we design a multi-block external memory and a Binary-Tree-Decoder for a
multi-microcontroller embedded system, so that the complete memory is divided into several blocks (four in our
case) that can be accessed independently by more than one microcontroller. This can be done in two ways: static
memory mapping and dynamic memory mapping. In static memory mapping, a microcontroller can access only the
particular block of memory to which it is mapped, whereas in dynamic memory mapping any microcontroller can
access any block of memory. Different parts of the program are also executed by different microcontrollers in
parallel, which speeds up execution on the multi-microcontroller system. Current embedded applications are
migrating from single-processor systems to multiprocessing systems with intensive data communication in order to
meet real-time application demands. The performance demanded by these applications requires a multiprocessor
architecture, and such systems need a memory-mapping mechanism that can support high speed. For a selected
memory-mapping mechanism, the question is which decoding mechanism and controller design give a low-power,
high-speed, low-area system. Our Binary-Tree-Decoder algorithm improves the MMSOPC embedded system design. To our
knowledge, a Binary-Tree-Decoder algorithm for a 256K memory has not been designed by other researchers; it is
presented in this work.
I. INTRODUCTION

Current embedded applications are migrating from single-processor systems to multiprocessing systems with
intensive data communication in order to meet real-time application demands. The performance demanded by these
applications requires a multiprocessor architecture. Such multiprocessor systems need a memory-mapping mechanism that
can support high speed. For a selected memory-mapping mechanism, the question is which decoding mechanism and
controller design give a low-power, high-speed, low-area system. On-chip micro-networks must meet the challenges of
providing correct functionality and reliable operation of interacting system-on-chip components [1]. Dynamic
partitioning of processing and memory resources in embedded MPSoC architectures is described in [3]; the idea of
dynamically partitioning memory resources is used here in designing the multi-block memory architecture. Among the
MPSoC design challenges, one of the key architectural points in design and programming is the memory access method [20].

To access the multi-block memory we use the dynamic memory mapping technique, and we have designed a controller
for this external memory, which is used in an 8-bit Multi-MicroController-System-On-Chip [20][3].
In a multi-microcontroller SOC, two kinds of memory mapping are possible:

• Static Memory mapping 

• Dynamic memory mapping 

A. STATIC MEMORY MAPPING: 

In static memory mapping, the micro-controller is able to access only a particular block of memory to which it is mapped. 



Microcontroller No.    Memory Block address    Memory Block No.
MC01                   0000-1FFF               BLOCK0
MC02                   2000-2FFF               BLOCK1
MC03                   3000-3FFF               BLOCK2
MC04                   4000-4FFF               BLOCK3

Fig. 1 Static memory mapping
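
The static mapping above amounts to a fixed lookup from each microcontroller to a single address range. A minimal Python sketch of that check is given here for illustration only; the names `STATIC_MAP` and `can_access_static` are hypothetical, and the address ranges are copied from the table as printed.

```python
# Static mapping: each microcontroller owns exactly one memory block.
# Ranges are inclusive hexadecimal addresses copied from the table above.
STATIC_MAP = {
    "MC01": ("BLOCK0", 0x0000, 0x1FFF),
    "MC02": ("BLOCK1", 0x2000, 0x2FFF),
    "MC03": ("BLOCK2", 0x3000, 0x3FFF),
    "MC04": ("BLOCK3", 0x4000, 0x4FFF),
}


def can_access_static(mc: str, address: int) -> bool:
    """Return True only if `address` lies inside the block mapped to `mc`."""
    _block, low, high = STATIC_MAP[mc]
    return low <= address <= high


print(can_access_static("MC02", 0x2ABC))  # True: inside BLOCK1
print(can_access_static("MC02", 0x3000))  # False: BLOCK2 is mapped to MC03
```

Any access outside the mapped range is simply rejected, which is what distinguishes this scheme from dynamic mapping.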



B. DYNAMIC MEMORY MAPPING: 

In dynamic memory mapping, any micro-controller can access any block of external memory, and different
programs can be executed by different micro-controllers in parallel, which yields high throughput and improves the
performance of the multi-microcontroller system [19]. Because dynamic memory mapping gives higher processing speed, we
follow this technique [22]. However, it is more complex and requires extra hardware to implement. The address
decoding technique used to generate memory addresses therefore needs to be enhanced so that it adds no extra hardware burden.

Microcontroller No.    Memory block No.                  Control signal    Status
MC01                   Block0, Block1, Block2, Block3    RD=0 & WR=1       READ
MC02                   Block0, Block1, Block2, Block3    RD=0 & WR=1       READ
MC03                   Block0, Block1, Block2, Block3    RD=0 & WR=1       READ
MC04                   Block0, Block1, Block2, Block3    RD=0 & WR=1       READ

In dynamic memory mapping, any micro-controller can access any block of memory.

Fig. 2 Dynamic memory mapping
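
As a rough behavioural illustration of the dynamic scheme, the Python sketch below decodes the RD/WR control pair and allows every microcontroller to reach every block. Only the READ case (RD=0, WR=1) comes from the table above; the WRITE, IDLE and INVALID cases are assumptions based on the common active-low strobe convention, and the function names are hypothetical.

```python
BLOCKS = ("Block0", "Block1", "Block2", "Block3")
MICROCONTROLLERS = ("MC01", "MC02", "MC03", "MC04")


def decode_status(rd: int, wr: int) -> str:
    """Decode the RD/WR control pair into a bus status string."""
    if rd == 0 and wr == 1:
        return "READ"      # stated in the table above
    if rd == 1 and wr == 0:
        return "WRITE"     # assumed: active-low write strobe
    if rd == 1 and wr == 1:
        return "IDLE"      # assumed: neither strobe asserted
    return "INVALID"       # assumed: both strobes asserted at once


def can_access_dynamic(mc: str, block: str) -> bool:
    """Under dynamic mapping every microcontroller may address every block."""
    return mc in MICROCONTROLLERS and block in BLOCKS


print(decode_status(0, 1))                   # READ
print(can_access_dynamic("MC03", "Block0"))  # True
```

In contrast with static mapping, the access check no longer depends on which microcontroller is asking, so conflicts between simultaneous requests have to be resolved elsewhere, for example by an arbitration controller.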
II. MEMORIES IN MULTI-MICROCONTROLLER ENVIRONMENT

Several memory schemes are available for SOCs with multiple microcontrollers. Five of them are
listed below [27].

• UMA (Uniform Memory Access) 

• NUMA (Non-Uniform Memory Access) 

• COMA (Cache Only Memory Access) 

• CC-NUMA (Cache-Coherent Non-Uniform Memory Access)

• CC-COMA (Cache-Coherent Cache Only Memory Access)



A. UMA (Uniform Memory Access) Model: 

Physical memory is uniformly shared by all the processors. All the processors have equal memory access time to 
all memory words, so it is called uniform memory access. 
Fig. 3 UMA memory model for a Multi-Microcontroller SOC

Each processor may use a private cache. Such multiprocessors are called tightly coupled systems because of the
high degree of resource sharing. The system interconnect takes the form of a common bus, a crossbar switch or a
multistage network. Synchronization and communication among processors are done through shared variables in the common
memory. When all the processors have equal access to all peripheral devices, the system is called a symmetric
multiprocessor; here all the processors are equally capable of running executive programs such as the operating system
kernel and I/O service routines. When only one processor or a subset of processors is executive-capable, the system is
called an asymmetric multiprocessor. An executive or master processor can execute the operating system and handle I/O.
The remaining processors, which have no I/O capability, are called Attached Processors (APs) and execute code under
the supervision of the master processor.



B. NUMA (Non-Uniform Memory Access) Model: 

NUMA multiprocessor is a shared-memory system where access time varies with the location of the memory word. 
Fig. 4 NUMA model for multiprocessor system



The shared memory is physically distributed among all processors as local memories. The collection of all local
memories forms a global address space accessible by all processors. A processor accesses its own local memory
faster than the remote memory attached to other processors, because of the added delay through the interconnection
network. Besides distributed memories, globally shared memory can be added to a multiprocessor system. In this case, there
are three memory-access patterns:

(a) Local memory access (Fastest) 

(b) Global memory access. 

(c) Remote memory access (slowest) 
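
The three access patterns listed above can be expressed as a simple classification. The Python sketch below models only their relative ordering (local fastest, remote slowest), since no latency figures are given here; placing global access between the two follows the order of the list and is an assumption. The identifier `GLOBAL_MODULE` and the function name are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access patterns ordered by relative cost: lower value means faster."""
    LOCAL = 0    # (a) fastest
    GLOBAL = 1   # (b) assumed to sit between local and remote
    REMOTE = 2   # (c) slowest


GLOBAL_MODULE = "GSM"  # hypothetical name for a globally shared memory module


def classify_access(processor: str, module_owner: str) -> Access:
    """Classify an access by who owns the memory module being addressed."""
    if module_owner == GLOBAL_MODULE:
        return Access.GLOBAL
    if module_owner == processor:
        return Access.LOCAL
    return Access.REMOTE


print(classify_access("P0", "P0").name)   # LOCAL
print(classify_access("P0", "GSM").name)  # GLOBAL
print(classify_access("P0", "P1").name)   # REMOTE
```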

In hierarchically structured multiprocessors, processors are divided into clusters, each of which is itself a UMA or a
NUMA multiprocessor. Clusters are connected to global shared-memory modules. All processors belonging to the same
cluster are allowed to uniformly access the cluster shared-memory modules. All the clusters have equal access to the
global memory.

C. COMA (Cache Only Memory Access) Model: 

A multiprocessor using cache-only memory assumes the COMA model. This model is a special case of a NUMA
machine in which the distributed main memories are converted to caches. There is no memory hierarchy at each processor
node. All the cache memory is accessible through a global address space. Remote cache access is assisted by the
distributed cache directories (D).

D. Other Models:

Another variation of multiprocessors is the cache-coherent non-uniform memory access (CC-NUMA) model,
which is specified with distributed shared memory and cache directories.

One more variation is a cache-coherent COMA machine, in which all cache copies must be kept consistent.

Fig. 5 COMA model of multiprocessor






A Design Space Exploration of Binary-Tree-Decoder For Multi-Block- External-Memory-Controller . 




III. SOC MODEL AND DESIGN 

BLOCK DIAGRAM OF SYSTEM 
Fig. 6 Generic Architecture of Multi-microcontroller

This is the generic system architecture of the multi-microcontroller system on chip. In our project the multi-block
external memory is essentially the integrated form of the following components:

1. MAR (Memory Address Register)

2. MDR (Memory Data Register)

3. MMBAC (Multi Memory Block Arbitration Controller)

4. Binary Tree Decoder

5. Memory Block Comparator

6. Memory of 256 Kb

In the architecture shown above, the four microcontrollers are connected to the memory through the controllers and
buffer registers (MAR, MDR) by means of the address bus, data bus and control bus. A bus is simply a set of
parallel wires that connects two components. In this architecture the data buses are 8 bits wide and the address buses are 16 bits wide.
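
The bus widths fix how much memory one address bus can reach. The short Python calculation below is a sketch under stated assumptions: it treats each address as selecting one 8-bit word, and the final line assumes that each of the four blocks spans the full 16-bit range, which the text does not state explicitly (it also does not say how a block is selected). The function name is hypothetical.

```python
ADDRESS_BUS_BITS = 16  # from the architecture description
DATA_BUS_BITS = 8      # from the architecture description


def addressable_bytes(address_bits: int, data_bits: int) -> int:
    """Bytes reachable when each address selects one word of `data_bits`."""
    locations = 2 ** address_bits
    return locations * data_bits // 8


block_bytes = addressable_bytes(ADDRESS_BUS_BITS, DATA_BUS_BITS)
print(block_bytes)               # 65536 locations of one byte each
print(block_bytes // 1024)       # 64 (KB per 16-bit address space)
print(4 * block_bytes // 1024)   # 256 (KB, assuming four full-range blocks)
```

Under these assumptions the result agrees with the 64KB block size quoted in the area comparison and with a 256K total for four blocks.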

A. BINARY TREE DECODER 

In the above implementation, the binary tree decoder is the block that enables dynamic memory access
while keeping power and area low.
Fig. 7 Decoding mechanism of Binary Tree Decoder



Following the decoding mechanism of the Binary Tree Decoder, binary tree generation and memory address generation
for all blocks of memory follow this algorithm:

    for i in 0 to 255 loop
        if conv_integer(BinaryTreeDecoder_En) = i then
            -- ForestTreeDecoder_Out := 16*i + conv_integer(y);  -- old algorithm
            BinaryTreeDecoder_Out := 256*i + conv_integer(y);    -- new algorithm
        end if;
    end loop;
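
To make the address arithmetic concrete, here is a small Python behavioural model of the two formulas. It is a sketch, not the synthesized design: the function names are hypothetical, the offset `y` is assumed to be 8 bits wide for the new formula and 4 bits wide for the old one, and the 0..4095 enable range for the old formula is taken from the decoder count reported in the area comparison.

```python
def binary_tree_decoder_out(enable: int, y: int) -> int:
    """New formula: address = 256*i + y, with 256 enable values."""
    if not 0 <= enable <= 255:
        raise ValueError("enable index must be in 0..255")
    if not 0 <= y <= 255:
        raise ValueError("offset y is assumed to fit in 8 bits")
    return 256 * enable + y


def forest_tree_decoder_out(enable: int, y: int) -> int:
    """Old formula: address = 16*i + y, with y assumed to be 4 bits wide."""
    if not 0 <= enable <= 4095:
        raise ValueError("enable index must be in 0..4095 (assumed range)")
    if not 0 <= y <= 15:
        raise ValueError("offset y is assumed to fit in 4 bits")
    return 16 * enable + y


# Both formulas reach the same top address of a 64K space.
print(hex(binary_tree_decoder_out(0x12, 0x34)))  # 0x1234
print(binary_tree_decoder_out(255, 255))         # 65535
print(forest_tree_decoder_out(4095, 15))         # 65535
```

Under these assumptions both schemes cover the same 65536 addresses, but the new formula does so with 256 enable values instead of 4096, a sixteen-fold reduction.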

B. RTL Schematic of Decoding Mechanism of Binary Tree Decoder

Fig. 8: RTL schematic of Binary Tree Decoder (BTD)
Fig. 8: RTL schematic of integrated block
Fig. 9: RTL schematic of Multi-Block-External-Memory-Controller in Multi-uC SOC design
C. Results 

The memory controller design using the binary tree decoder was implemented on an FPGA board; the tables below
summarise the results.
C.1. Design summary of Binary Tree Decoder



MEPSOPC Project Status
Project File:       MEPSOPC.ise
Current State:      Synthesized
Module Name:        BinaryTreeDecoder
Errors:             No Errors
Target Device:      xc2s600e-6fg456
Warnings:           2 Warnings (2 new, 0 filtered)
Product Version:    ISE 8.2i
Updated:            Wed Feb 1 15:54:55 2012

MEPSOPC Partition Summary
No partition information was found.

Device Utilization Summary (estimated values)
Logic Utilization         Used    Available    Utilization
Number of Slices          11      6912         0%
Number of 4 input LUTs    20      13824        0%
Number of bonded IOBs     34      329          10%
Number of BRAMs           1       72           1%
Number of GCLKs           2       4            50%

Report Name               Status     Generated                  Errors    Warnings                          Infos
Synthesis Report          Current    Wed Feb 1 15:43:08 2012    0         2 Warnings (2 new, 0 filtered)    3 Infos (3 new, 0 filtered)
Translation Report
Map Report
Place and Route Report
Static Timing Report
Bitgen Report

Secondary Reports
Report Name       Status    Generated
Xplorer Report

C.2. Comparative Area Analysis of Binary Tree Decoder and Forest Tree Decoder

Sr. No.   Parameter (64KB Memory Block)   Forest Tree Decoder                Binary Tree Decoder         Comments
1         No. of Decoders                 4096                               256                         16 times fewer than the Forest Tree Decoder
2         Synthesis Time/Issues           Unable to finish synthesis; tool   Able to synthesize faster
                                          crashes due to requirement of
                                          large virtual memory space
3         Area                            More gates                         Fewer gates
C.3. Power Analysis of Binary Tree Decoder

Performance metric (power)
S.No.    Type           Voltage (V)    Current (mA)    Power (mW)
1        Vccint         1.8            15              27
2        Vcco33         3.3            2               6.6
         Total Power                                   33.6
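
The power figures in this table follow from P = V × I on each supply rail. The Python check below reproduces them from the tabulated voltages and currents; the names `RAILS` and `rail_power_mw` are hypothetical, and values are rounded to one decimal place to match the table.

```python
# (rail name, voltage in V, current in mA), copied from the table above
RAILS = [
    ("Vccint", 1.8, 15.0),
    ("Vcco33", 3.3, 2.0),
]


def rail_power_mw(voltage_v: float, current_ma: float) -> float:
    """Power in mW for one rail: volts times milliamps, rounded to 0.1 mW."""
    return round(voltage_v * current_ma, 1)


for name, volts, milliamps in RAILS:
    print(name, rail_power_mw(volts, milliamps))  # Vccint 27.0, then Vcco33 6.6

total_mw = round(sum(rail_power_mw(v, i) for _, v, i in RAILS), 1)
total_ma = sum(i for _, _, i in RAILS)
print(total_mw)  # 33.6
print(total_ma)  # 17.0
```

The totals, 33.6 mW and 17 mA, agree with the X-Power summary figures reported for the integrated design.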



C.4. Results from X-Power summary report of the integrated design

Voltage               Total Current    Total Power
Vdd=3.3V, Vt=1.8V     17mA             33.6mW



IV. CONCLUSION 

Our result was tested, verified and prototyped in the following environment, using the following hardware and software
tools and technology:

Target Device:         xc2s600e-6fg456
On Board frequency:    50 MHz
Test frequency:        12 MHz
Supply Voltage:        3.3 volts
Worst Voltage:         1.4 to 1.6 volts
Temperature:           -40°C to 80°C
Speed Grade:           -6
FPGA Compiler:         Xilinx ISE 8.2i
Simulator used:        ModelSim XE-II 5.7g



The initial implementation was based on a Forest Tree Decoder. The major enhancement in this work is to
replace the Forest Tree Decoder with an area- and power-efficient Binary Tree Decoder. Because decoding is
one of the most time-consuming steps, this decoding mechanism, together with the multi-block memory architecture,
makes the whole MPSOC system faster and more efficient. With reference to the 2D VLSI complexity model, we found that,
with the x-axis width (data lines) kept fixed at 8 bits, the y-axis height (row address of memory) is reduced for the
binary tree decoder in comparison with the forest tree decoder. The overall area of the 2D model is therefore reduced,
which in turn increases the accessing speed of the multi-processor system: a smaller y-axis height means a shorter
search time for a particular memory word (8 bits here). In addition, because the area is smaller, the power consumed
is also lower.